Configuring Databricks Unity Catalog as a Data Lake

Databricks Unity Catalog, in this context, refers to the Unity Catalog features used to manage and organize structured, semi-structured, and unstructured data within a data lake environment. It provides metadata management and data cataloging. Metadata includes information about data schema, data lineage, and data quality metrics, while cataloging helps users discover, understand, and use the data assets in the data lake.

Prerequisites

Before you configure Databricks Unity Catalog as a data lake, ensure that you have the following:

  • A Databricks instance that is enabled for Unity Catalog.
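One way to confirm this prerequisite from code is to ask the workspace which metastore it is attached to. The sketch below is illustrative only and is not part of the Calibo Accelerate platform. It relies on the Databricks SQL function current_metastore(), and it assumes a hypothetical run_sql callable (for example, a thin wrapper around whatever SQL client you already use) that takes a SQL string and returns rows. It also assumes that a workspace without Unity Catalog either rejects the query or returns no value.

```python
from typing import Callable, Optional, Sequence

# Databricks SQL built-in that returns the ID of the Unity Catalog
# metastore attached to the current workspace.
METASTORE_QUERY = "SELECT current_metastore()"


def attached_metastore(
    run_sql: Callable[[str], Sequence[Sequence[object]]],
) -> Optional[str]:
    """Return the metastore ID reported by the workspace, or None.

    run_sql is a hypothetical callable supplied by the caller: it takes a
    SQL string and returns the result rows.
    """
    try:
        rows = run_sql(METASTORE_QUERY)
    except Exception:
        # Assumption: a workspace without Unity Catalog rejects the query.
        return None
    if not rows or not rows[0] or rows[0][0] is None:
        return None
    return str(rows[0][0])
```

A return value of None suggests that the workspace is not yet attached to a metastore. Any other value is the metastore ID.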

Configuring Databricks Unity Catalog Connection Details

  1. Sign in to the Calibo Accelerate platform and click Configuration in the left navigation pane.
  2. On the Platform Setup screen, on the Cloud Platform, Tools & Technologies tile, click Configure.
  3. On the Cloud Platform, Tools & Technologies screen, in the Databases and Data Warehouses section, click Configure.

(After you save your first connection details in this section, you see the Modify button here.)


  1. In the list of available database and data warehouse options, click Databricks Unity Catalog.

  2. On the Databricks Unity Catalog screen, do the following:

    1. In the Details section, provide the following details:

      Field: Description
      Name: Give a unique name to your Databricks Unity Catalog configuration. This name is used to save and identify your specific Databricks Unity Catalog connection details within the Calibo Accelerate platform.
      Description: Provide a brief description that helps you identify the purpose or context of this Databricks Unity Catalog configuration.
    2. In the Configuration section, provide the following information:

      Field: Description
      Databricks Configuration: Select a configured Databricks connection name from the dropdown list.
      Metastore: The metastore associated with the selected Databricks configuration is auto-populated.
      Databricks Workspace (Optional): Provide the details of a Databricks workspace associated with the metastore. Providing these details is optional at this stage. To provide them, click Workspace Details and enter the following information:

      • Databricks Account ID

      • Databricks Client ID

      • Databricks Secret

      After you provide these details, click Test Connection and Fetch Details.

      Note: The Calibo Accelerate Orchestrator Agent and the Secrets Manager settings are inherited from the selected Databricks configuration.

      Catalog: Select a catalog from the dropdown list. A catalog is the first layer in Unity Catalog's three-level namespace (catalog.schema.table).
      Storage Location: The configured storage location is displayed. A storage location is an object that combines a cloud storage path with a storage credential that authorizes access to that cloud storage path.
      Temporary File Location (Schema): A list of temporary locations is populated, based on the catalog that you select. This temporary location can be used for schema and checkpoint locations.
    3. Secure configuration details with a password - To password-protect your Databricks Unity Catalog connection details, turn on this toggle, enter a password, and then retype it to confirm. This is optional but recommended. When you share the connection details with multiple users, password protection helps you ensure authorized access to the connection details.
    4. Click Save Configuration.
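To make the fields above concrete, here is a minimal Python sketch that models a saved configuration and shows how the selected catalog becomes the first part of a catalog.schema.table name. The class names, field names, and sample values are hypothetical and chosen only for illustration. They do not reflect how the Calibo Accelerate platform stores the configuration. The backtick quoting follows the Databricks SQL convention for delimited identifiers.

```python
from dataclasses import dataclass
from typing import Optional


def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    if not name:
        raise ValueError("identifier must not be empty")
    return "`" + name.replace("`", "``") + "`"


@dataclass(frozen=True)
class WorkspaceDetails:
    """Optional values entered under Workspace Details (hypothetical model)."""
    account_id: str
    client_id: str
    secret: str


@dataclass(frozen=True)
class UnityCatalogConfig:
    """Hypothetical in-memory model of the fields described in this topic."""
    name: str
    databricks_configuration: str
    metastore: str
    catalog: str
    storage_location: str
    temporary_schema: str
    description: str = ""
    workspace: Optional[WorkspaceDetails] = None  # optional at this stage

    def table_name(self, schema: str, table: str) -> str:
        """Return the fully qualified catalog.schema.table name."""
        parts = (self.catalog, schema, table)
        return ".".join(quote_identifier(part) for part in parts)


# Sample values below are placeholders, not real resources.
config = UnityCatalogConfig(
    name="uc-dev",
    databricks_configuration="databricks-dev",
    metastore="example-metastore",
    catalog="sales",
    storage_location="s3://example-bucket/unity-catalog",
    temporary_schema="tmp",
)
print(config.table_name("raw", "orders"))  # prints `sales`.`raw`.`orders`
```

Because the catalog is fixed by the configuration, callers supply only the schema and table. This mirrors the way the catalog that you select determines which temporary schema locations are offered.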

 


What's next? Cloud Platforms, Tools, and Technologies